We study first passage statistics of the Polya urn model. In this random process, the urn contains
two types of balls. In each step, one ball is drawn randomly from the urn, and subsequently placed
back into the urn together with an additional ball of the same type. We derive the probability $G_n$
that the two types of balls are equal in number, for the first time, when there is a total of $2n$ balls.
This first passage probability decays algebraically, $G_n \sim n^{-2}$, when $n$ is large. We also derive the
probability that a tie ever happens. This probability is between zero and one, so that a tie may
occur in some realizations but not in others. The likelihood of a tie is appreciable only if the initial
difference in the number of balls is of the order of the square-root of the total number of balls.

PACS numbers: 02.50.Cw, 05.40.-a, 02.10.Ox 
I. INTRODUCTION

Urn models play a central role in probability theory
and combinatorics [1, 2]. Since the balls can represent
anything from atoms to biological organisms to humans,
urn models are widely used in the physical, life, and social
sciences [3].

In this paper, we investigate the classic Polya urn
model [3-6]. This urn process is a type of birth process,
and it is useful for modeling the spread of infectious diseases,
population dynamics, and evolutionary processes
in biology [5, 7-10]. Furthermore, this stochastic process
is a branching process [11], and it is used to model data
structures in computer science [12-14]. From the myriad
of other applications, we mention decision making [15],
reinforcement learning [16], and Internet browser usage
[17]. We also note that the Polya urn model is a limiting
case of earlier urn models investigated by Laplace [18],
Markov [19], and Ehrenfest [20].

The Polya urn model exhibits rich and interesting phenomenology
that includes strong influence of the initial
conditions, large realization-to-realization fluctuations,
and substantial finite-size corrections [3, 21, 22]. In this
study, we obtain the first passage properties [23] of the
Polya urn model, and contrast these with the first passage
characteristics of an ordinary random walk [23, 24].

In the Polya urn model, there are two types of balls,
black and white. In a basic step, one ball is selected
randomly from all balls in the urn. This ball is then
returned to the urn together with an additional ball of the
same color. Starting with a given configuration of balls,
the number of balls increases indefinitely by repeating
this step ad infinitum. Thus, a configuration $(B, W)$ with
$B$ black balls and $W$ white balls evolves according to

$$
(B, W) \to
\begin{cases}
(B+1, W) & \text{with probability } \frac{B}{B+W}, \\
(B, W+1) & \text{with probability } \frac{W}{B+W}.
\end{cases}
\tag{1}
$$
FIG. 1: The urn process as a trajectory on a two-dimensional
lattice (bold line) where bullets indicate intermediate stages
of the trajectory. Exit from the $B > W$ region is equivalent
to this trajectory reaching the diagonal (broken line).

We investigate first passage properties of the urn process
(figure 1). We find that the probability $G_n$ that a
tie is reached, for the first time, when there are $n$ balls
of each type decays algebraically,

$$ G_n \sim n^{-2}, \tag{2} $$

for large $n$. This asymptotic behavior holds for arbitrary
initial conditions. We also show that the total exit probability,
that is, the probability that a tie is ever reached,
is less than one. Hence, an initial imbalance in the number
of balls can be locked in forever. We study how the
total exit probability depends on the initial condition and
find that it is appreciable only when the imbalance in the
number of balls is of the order of the square-root of the total
number of balls.

The rest of this paper is organized as follows. We de- 
rive the first passage probability in section II. We then 
obtain the exit probability by summing the first passage 
probability (section III). We discuss the extreme cases 
of nearly-maximal and extremely small exit probabilities 
(sec. IV), and then use these limiting behaviors to es- 
tablish scaling properties of the exit probability (sec. V). 
We generalize the results to near-ties in section VI, and 
conclude in section VII. 



II. THE FIRST PASSAGE PROBABILITY 

Our goal is to quantify the first passage process, illustrated
in figure 1, using the first passage probability
and the total exit probability. For the initial condition
$(B, W) = (b, w)$ where, without loss of generality,
black balls are in the majority, $b > w$, the first passage
probability $G_n(b, w)$ is the likelihood that a tie is
reached, for the first time, when $(B, W) = (n, n)$. In
other words, $G_n(b, w)$ is the probability that the initial
imbalance holds, $B > W$, if and only if $W < n$. The
total exit probability $E(b, w)$ is the probability that a tie
is ever reached.
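
These definitions can be probed numerically. The following is a minimal Monte Carlo sketch, not part of the original analysis: the function names `first_tie` and `estimate_exit_probability`, the number of trials, and the seed are illustrative assumptions. It applies the urn rule one draw at a time and records whether a tie occurs before the urn holds $2 n_{\max}$ balls.

```python
import random

def first_tie(b, w, n_max, rng):
    """Run one realization from (b, w) with b > w.

    Returns n if the first tie is at (n, n), or None if no tie has occurred
    by the time the urn holds 2 * n_max balls.
    """
    B, W = b, w
    while B + W < 2 * n_max:
        # Draw one ball uniformly at random and add another of the same color.
        if rng.random() * (B + W) < B:
            B += 1
        else:
            W += 1
        if B == W:
            return B
    return None

def estimate_exit_probability(b, w, n_max, trials=10000, seed=1):
    """Fraction of realizations that tie by the time the urn holds 2 * n_max balls."""
    rng = random.Random(seed)
    ties = sum(first_tie(b, w, n_max, rng) is not None for _ in range(trials))
    return ties / trials

print(estimate_exit_probability(2, 1, n_max=50))
```

The estimate carries statistical noise of order one over the square root of the number of trials, so it only serves as a rough consistency check on the exact results derived below.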

These first passage characteristics are of interest in a 
variety of contexts. For example, in the growth of bacterial
colonies [25, 26], when bacteria proliferate without
resource limitations, the exit probability measures the 
likelihood that the minority species eventually overtakes 
the majority species. In the context of a branching pro- 
cess, first passage statistics quantify the likelihood that 
two branches of a tree reach perfect balance. 

As a preliminary step to finding the first passage
probability, we obtain the likelihood that the system
reaches configuration $(B, W) = (m, n)$ starting from
$(B, W) = (b, w)$. Let us consider, for example, the transition
$(1,1) \to (3,3)$ where one possible path is

$$ (1,1) \to (1,2) \to (1,3) \to (2,3) \to (3,3). $$

The likelihood of this path is

$$
\frac{1}{2} \times \frac{2}{3} \times \frac{1}{4} \times \frac{2}{5}
= \frac{(1 \cdot 2) \cdot (1 \cdot 2)}{2 \cdot 3 \cdot 4 \cdot 5}.
\tag{3}
$$



There are $\binom{4}{2} = 6$ distinct routes from $(1, 1)$ to $(3, 3)$ and
they all have the same probability (3).
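
The equiprobability of the six routes can be checked by brute force. The sketch below is illustrative only (the helper `path_probability` is a hypothetical name): it enumerates every ordering of two black and two white additions and multiplies the step probabilities exactly.

```python
from fractions import Fraction
from itertools import permutations

def path_probability(start, steps):
    """Probability of one specific sequence of draws.

    'B' means a black ball is drawn (and one is added), 'W' likewise for white.
    """
    B, W = start
    prob = Fraction(1)
    for s in steps:
        if s == "B":
            prob *= Fraction(B, B + W)
            B += 1
        else:
            prob *= Fraction(W, B + W)
            W += 1
    return prob

# All distinct orderings of two black and two white additions: (1,1) -> (3,3).
routes = sorted(set(permutations("BBWW")))
probs = [path_probability((1, 1), r) for r in routes]
print(len(routes), set(probs), sum(probs))
```

Every route has probability 1/30, the value of the product in (3), so the six routes together carry probability 1/5.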

In general, all paths from configuration $(b, w)$ to configuration
$(m, n)$ have the same probability

$$
\frac{[b(b+1) \cdots (m-1)] \, [w(w+1) \cdots (n-1)]}{(b+w)(b+w+1) \cdots (m+n-1)}.
$$

We rewrite this probability using factorials,

$$
\frac{(m-1)!}{(b-1)!} \, \frac{(n-1)!}{(w-1)!} \, \frac{(b+w-1)!}{(m+n-1)!}.
$$

The total number of distinct paths from $(b, w)$ to $(m, n)$
equals the binomial $\binom{m+n-b-w}{m-b}$. Hence, the transition
probability $P$ that, starting from configuration $(b, w)$, the
system reaches configuration $(m, n)$ is [3-6]

$$
P = \frac{\binom{m-1}{b-1} \binom{n-1}{w-1}}{\binom{m+n-1}{b+w-1}}.
\tag{4}
$$

In particular, the probability distribution is flat [3, 5],
$P = \frac{1}{m+n-1}$, for the initial condition $(b, w) = (1, 1)$.

The number of paths from $(b, w)$ to $(n, n)$ that reach
the diagonal $B = W$ only at the end point equals $\frac{b-w}{2n-b-w}$
times the total number of such paths. This result can
be established using the reflection principle [1]. Since
all paths from $(b, w)$ to $(n, n)$ are equiprobable, the first
passage probability is simply $G_n(b,w) = \frac{b-w}{2n-b-w} P$. By
substituting $m = n$ into Eq. (4), we obtain our first main
result, the first passage probability

$$
G_n(b, w) = \frac{b-w}{b+w} \binom{n-1}{b-1} \binom{n-1}{w-1} \binom{2n-1}{b+w}^{-1}.
\tag{5}
$$

This quantity decays algebraically,

$$ G_n(b,w) \simeq A(b,w) \, n^{-2}, \tag{6} $$

in the asymptotic limit $n \gg b, w$. The proportionality
constant in (6) is
$A(b,w) = \frac{b-w}{2^{b+w}} \frac{\Gamma(b+w)}{\Gamma(b)\Gamma(w)}$,
as follows from the large-$n$ form of the binomials in (5).

For the special case $(b,w) = (2,1)$, the first passage
probability (5) is simply

$$ G_n(2,1) = \frac{1}{(2n-3)(2n-1)}. \tag{7} $$

The probability that a tie is ever reached equals the sum
$\frac{1}{3} + \frac{1}{15} + \frac{1}{35} + \cdots = \frac{1}{2}$, and hence, there is a finite chance
that the initial imbalance in the number of balls is maintained
forever. This behavior is different from that of a
one-dimensional random walk. In an ordinary random
walk, the two elementary transitions in (1) occur with
probability 1/2, and the first passage probability decays
algebraically, $G_n \sim n^{-3/2}$, for large $n$. Yet, the exit probability
equals one, and the random walk is guaranteed to
reach the diagonal $B = W$. Thus, the one-dimensional
random walk is recurrent, but the Polya urn process is
transient.



III. THE EXIT PROBABILITY

The exit probability $E_n(b,w)$ is the likelihood that,
starting from configuration $(b,w)$, a tie happens by the
time the urn contains $2n$ balls. The exit probability follows
from the first passage probability,

$$ E_n(b,w) = \sum_{b \le j \le n} G_j(b,w). \tag{8} $$

The lower limit reflects that the quickest tie occurs when
$(B, W) = (b, b)$. We are especially interested in the total
exit probability, $E(b,w) = \lim_{n \to \infty} E_n(b, w)$. From the
identity $E_n(b,w) - E_{n-1}(b, w) = G_n(b,w)$ and equation (6), we
conclude the asymptotic behavior

$$ E(b,w) - E_n(b, w) \simeq A(b, w) \, n^{-1}, \tag{9} $$

when $n \gg b$. In particular, the quantity
$E_n(2, 1) = \frac{n-1}{2n-1}$, that is, the sum of (7), agrees with (9).
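
The closed form (5) can be compared with a direct propagation of probabilities under the urn rule. This is a sketch under stated assumptions, not the authors' procedure: `first_passage_formula` and `first_passage_direct` are hypothetical names, and the direct version simply removes probability weight the first time a trajectory touches the diagonal.

```python
from fractions import Fraction
from math import comb

def first_passage_formula(n, b, w):
    """Eq. (5): probability that the first tie occurs at (n, n), from (b, w) with b > w."""
    prefactor = Fraction(b - w, b + w)
    return prefactor * comb(n - 1, b - 1) * comb(n - 1, w - 1) / comb(2 * n - 1, b + w)

def first_passage_direct(n_max, b, w):
    """Propagate exact probabilities step by step; weight reaching B = W is absorbed."""
    alive = {(b, w): Fraction(1)}      # configurations with B > W that have not tied yet
    G = {}                             # G[n] = probability that the first tie is at (n, n)
    for total in range(b + w, 2 * n_max):
        nxt = {}
        for (B, W), p in alive.items():
            moves = (((B + 1, W), Fraction(B, total)), ((B, W + 1), Fraction(W, total)))
            for state, q in moves:
                if state[0] == state[1]:
                    G[state[0]] = G.get(state[0], 0) + p * q
                else:
                    nxt[state] = nxt.get(state, 0) + p * q
        alive = nxt
    return G

G = first_passage_direct(12, 2, 1)
for n in range(2, 13):
    print(n, G[n], first_passage_formula(n, 2, 1))
print("E_12(2,1) =", sum(G.values()))
```

For the initial condition $(2,1)$ the two columns agree with (7), and the partial sum up to $n = 12$ equals $11/23$, the value of $\frac{n-1}{2n-1}$.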



To evaluate the total exit probability $E(b,w)$,
we introduce the shorthand notation $C_k(b,w) \equiv G_{b+k}(b,w)$, so that

$$
C_k(b,w) = \frac{b-w}{b+w} \binom{b+k-1}{b-1} \binom{b+k-1}{w-1} \binom{2b+2k-1}{b+w}^{-1},
\tag{10}
$$

which is obtained by substituting $n = b + k$ into (5). In
particular, the quantity

$$ C_0(b,w) = \frac{\Gamma(b)\Gamma(b+w)}{\Gamma(2b)\Gamma(w)} $$

is the probability that a tie occurs as quickly as possible.
In terms of the quantities $C_k(b, w)$, the total exit probability
$E(b, w) = \sum_{j \ge b} G_j(b, w)$ equals

$$ E(b,w) = \sum_{k \ge 0} C_k(b,w). \tag{11} $$



We now evaluate the ratio of two consecutive first passage
probabilities,

$$
\frac{C_{k+1}(b,w)}{C_k(b,w)} =
\frac{(k+b)\left(k+\frac{b-w}{2}\right)\left(k+\frac{b-w+1}{2}\right)}
     {(k+1)\left(k+b+\frac{1}{2}\right)(k+b-w+1)}.
\tag{12}
$$

Given these ratios, the exit probability can be expressed
in terms of the hypergeometric function [27],

$$
E(b,w) = \frac{\Gamma(b)\Gamma(b+w)}{\Gamma(2b)\Gamma(w)} \,
F\!\left(b, \tfrac{b-w}{2}, \tfrac{b-w+1}{2}; \, b+\tfrac{1}{2}, \, b-w+1; \, 1\right).
\tag{13}
$$

This closed form expression is our second main result. As
expected, $E(b, b) = 1$ and $E(b, 0) = 0$.
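
The two ingredients of the hypergeometric representation, the first term $C_0$ and the term ratio (12), can be verified exactly for integer $b$ and $w$. The sketch below uses hypothetical helper names (`C`, `ratio`, `C0_gamma`) and rational arithmetic, with Gamma functions at integer arguments written as factorials.

```python
from fractions import Fraction
from math import comb, factorial

def C(k, b, w):
    """Eq. (10): first passage probability at n = b + k."""
    n = b + k
    prefactor = Fraction(b - w, b + w)
    return prefactor * comb(n - 1, b - 1) * comb(n - 1, w - 1) / comb(2 * n - 1, b + w)

def ratio(k, b, w):
    """Right-hand side of Eq. (12), evaluated with exact fractions."""
    d = b - w
    num = (k + b) * Fraction(2 * k + d, 2) * Fraction(2 * k + d + 1, 2)
    den = (k + 1) * Fraction(2 * k + 2 * b + 1, 2) * (k + d + 1)
    return num / den

def C0_gamma(b, w):
    """Gamma(b) Gamma(b + w) / (Gamma(2b) Gamma(w)) for positive integers b, w."""
    return Fraction(factorial(b - 1) * factorial(b + w - 1),
                    factorial(2 * b - 1) * factorial(w - 1))

for b, w in [(2, 1), (5, 2), (7, 6)]:
    ok_first = C(0, b, w) == C0_gamma(b, w)
    ok_ratio = all(C(k + 1, b, w) == C(k, b, w) * ratio(k, b, w) for k in range(20))
    print((b, w), ok_first, ok_ratio)
```

Because each term is generated from its predecessor by a rational function of $k$ with three factors upstairs and three downstairs (one of them $k+1$), the sum (11) is a hypergeometric series of the type appearing in (13).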

We also note that the exit probability satisfies the compact
recursion relation

$$
E(b,w) = \frac{b}{b+w} E(b+1,w) + \frac{w}{b+w} E(b,w+1),
\tag{14}
$$

for all $b > w$. The boundary conditions are $E(b, b) = 1$
and $E(b, 0) = 0$. This recursion follows directly from the
definition of the stochastic process (1), and is reminiscent
of the recursion equation for an ordinary random walk,
$E(b,w) = \frac{1}{2} E(b+1,w) + \frac{1}{2} E(b,w+1)$ [23]. We use this
recursion to analyze extremal properties of $E(b, w)$ in the
next section.



IV. EXTREMAL BEHAVIOR 

Intuitively, we expect that when $b$ is fixed, the exit
probability increases monotonically with $w$. The exit
probability is largest when the number of balls is balanced,
$E(b,b) = 1$, and conversely, the exit probability
is smallest when the initial imbalance is maximal,
$E(b, 0) = 0$. We now discuss the extreme cases of very
small and nearly-maximal exit probabilities, respectively.
When $w = 1$, the exit probability decays exponentially
with the total number of balls,

$$ E(b,1) = 2^{1-b}. \tag{15} $$



To obtain this result, we note that when $w = 1$, two of the
arguments of the hypergeometric function in (13) coincide
and hence,
$E(b, 1) = \frac{\Gamma(b)\Gamma(b+1)}{\Gamma(2b)} F\left(\frac{b-1}{2}, \frac{b}{2}; b+\frac{1}{2}; 1\right)$.
We obtain the expression (15) using the Gauss identity
for the hypergeometric function,

$$
F(x,y;z;1) = \frac{\Gamma(z-x-y)\Gamma(z)}{\Gamma(z-x)\Gamma(z-y)},
\tag{16}
$$

and the following two identities for the Gamma function,
$\Gamma(x+1) = x\Gamma(x)$ and $\Gamma(\frac{1}{2})\Gamma(2x) = 2^{2x-1}\Gamma(x)\Gamma(x+\frac{1}{2})$.
By substituting $E(b, 1) = 2^{1-b}$ into the recursion (14),
we have $E(b, 2) = (b + 2)2^{-b}$. Similarly, we obtain
$E(b, 3) = (b^2 + 5b + 8)2^{-b-2}$ by substituting $E(b, 2)$ into
(14). In general, the exit probability has the form



$$ E(b,w) = \frac{U_w(b)}{(w-1)!} \, 2^{2-w-b}, \tag{17} $$

where $U_w(b)$ is a polynomial of degree $w - 1$ in the variable
$b$. From equation (14), these polynomials satisfy the
recursion

$$ U_{w+1}(b) = 2(b + w)U_w(b) - bU_w(b+1). \tag{18} $$

Starting with the boundary condition, $U_1(b) = 1$, we have

$$
U_w(b) =
\begin{cases}
1 & w = 1, \\
b + 2 & w = 2, \\
b^2 + 5b + 8 & w = 3, \\
b^3 + 9b^2 + 32b + 48 & w = 4, \\
b^4 + 14b^3 + 83b^2 + 262b + 384 & w = 5.
\end{cases}
\tag{19}
$$
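
The polynomials in (19) follow mechanically from the recursion (18). As an illustrative sketch (the representation as coefficient lists in increasing powers of $b$ and the helper names `shift` and `next_U` are assumptions made here), the recursion can be iterated as follows.

```python
from math import comb

def shift(poly):
    """Coefficients of p(b + 1), given those of p(b) in increasing powers of b."""
    out = [0] * len(poly)
    for j, c in enumerate(poly):
        for i in range(j + 1):
            out[i] += c * comb(j, i)      # (b + 1)^j = sum_i C(j, i) b^i
    return out

def next_U(U, w):
    """Recursion (18): U_{w+1}(b) = 2 (b + w) U_w(b) - b U_w(b + 1)."""
    shifted = shift(U)
    new = [0] * (len(U) + 1)
    for i, c in enumerate(U):
        new[i] += 2 * w * c               # 2 w U_w(b)
        new[i + 1] += 2 * c               # 2 b U_w(b)
    for i, c in enumerate(shifted):
        new[i + 1] -= c                   # - b U_w(b + 1)
    return new

U = [1]                                   # boundary condition U_1(b) = 1
table = {1: U}
for w in range(1, 5):
    U = next_U(U, w)
    table[w + 1] = U

for w, coeffs in table.items():
    print(w, coeffs)
```

The list printed for $w = 5$ is `[384, 262, 83, 14, 1]`, that is, $b^4 + 14b^3 + 83b^2 + 262b + 384$, and the leading coefficient stays equal to one at every order.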



Since the coefficient of the dominant term in $U_w(b)$ equals
one, the exit probability decays exponentially with the
total initial population,

$$ E(b,w) \simeq \frac{b^{w-1}}{(w-1)!} \, 2^{2-b-w}, \tag{20} $$

when $w$ is finite and $b \to \infty$.
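
The successive substitutions into the recursion (14) can be automated. In the sketch below (the names `exit_probability` and `leading_form` are hypothetical), the recursion written at $(b, w-1)$ is solved for $E(b,w)$ and seeded with $E(b,1) = 2^{1-b}$; the exact rational values are then compared with the leading behavior (20).

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def exit_probability(b, w):
    """Exact E(b, w) for integers b >= w >= 1, from Eq. (15) and recursion (14)."""
    if w == 1:
        return Fraction(2, 2 ** b)                  # Eq. (15): E(b, 1) = 2^(1 - b)
    # Recursion (14) written at (b, w - 1) and solved for E(b, w).
    upper = (b + w - 1) * exit_probability(b, w - 1)
    lower = b * exit_probability(b + 1, w - 1)
    return (upper - lower) / (w - 1)

def leading_form(b, w):
    """Right-hand side of Eq. (20)."""
    return Fraction(b ** (w - 1), factorial(w - 1)) / 2 ** (b + w - 2)

for b in (10, 100, 1000):
    print(b, float(exit_probability(b, 3) / leading_form(b, 3)))
```

The printed ratio approaches one as $b$ grows, with a relative correction of order $1/b$ that comes from the subleading terms of $U_w(b)$.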

To analyze the behavior in the opposite limit of nearly-maximal
exit probabilities, we consider the special case
$w = b - 1$ where the ratio (12) simplifies as follows,

$$
\frac{C_{k+1}}{C_k} = \frac{(k+b)\left(k+\frac{1}{2}\right)}{(k+2)\left(k+b+\frac{1}{2}\right)}.
\tag{21}
$$

We now shift the index of the first passage probability by
one, $D_{k+1} = C_k$, with $D_0 = -1$, and then evaluate the
sum (11) to find $E(b, b-1) = 1 - F\left(b-1, -\frac{1}{2}; b-\frac{1}{2}; 1\right)$.
Further, we express the exit probability through Gamma
functions by using the identity (16),

$$
E(b,b-1) = 1 - \frac{\Gamma\left(b-\frac{1}{2}\right)}{\Gamma(b)\Gamma\left(\frac{1}{2}\right)}.
\tag{22}
$$

FIG. 2: The scaling function $\Phi(z)$ versus the scaling variable
$z$. Shown is the exit probability $E(b,w)$, evaluated using the
exact expression (13), for three different values of $N = b + w$.



The exit probability increases monotonically with $b$:
$E(b,b-1) = 1/2, 5/8, 11/16$ for $b = 2, 3, 4$. Moreover,
ties become practically certain, $E(b, b-1) \simeq 1 - 1/\sqrt{\pi b}$,
in the limit $b \to \infty$.
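
These values are easy to reproduce. The sketch below (the function name is hypothetical) uses the fact that, for integer $b$, the ratio $\Gamma(b-\frac{1}{2}) / [\Gamma(b)\Gamma(\frac{1}{2})]$ equals the finite product $\prod_{j=1}^{b-1} \frac{2j-1}{2j}$, which follows from $\Gamma(x+1) = x\Gamma(x)$.

```python
from fractions import Fraction
from math import pi, sqrt

def exit_probability_minimal_imbalance(b):
    """Eq. (22) for integer b >= 1, evaluated exactly."""
    ratio = Fraction(1)
    for j in range(1, b):
        ratio *= Fraction(2 * j - 1, 2 * j)   # builds Gamma(b - 1/2) / (Gamma(b) Gamma(1/2))
    return 1 - ratio

print([exit_probability_minimal_imbalance(b) for b in (2, 3, 4)])
for b in (10, 100, 1000):
    print(b, float(exit_probability_minimal_imbalance(b)), 1 - 1 / sqrt(pi * b))
```

The first line gives the fractions 1/2, 5/8, and 11/16 quoted in the text; the remaining lines show the exact value approaching $1 - 1/\sqrt{\pi b}$.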

Along the same lines, we evaluate $E(b, b-q)$ by
substituting the form (22) and the boundary condition
$E(b, b) = 1$ into the recursion (14),

$$
E(b,b-q) = 1 - \frac{\Gamma\left(b-\frac{1}{2}\right)}{\Gamma(b)\Gamma\left(\frac{1}{2}\right)} \times
\begin{cases}
1 & q = 1, \\
2 & q = 2, \\
3 \, \frac{b-5/3}{b-3/2} & q = 3, \\
4 \, \frac{b-2}{b-3/2} & q = 4.
\end{cases}
\tag{23}
$$

From these examples, we conclude $E(b, b-q) \simeq 1 - q/\sqrt{\pi b}$
when $q$ is finite and $b \to \infty$. In other words,

$$
E(b,w) \simeq 1 - \sqrt{\frac{2}{\pi}} \, \frac{|b-w|}{\sqrt{b+w}},
\tag{24}
$$

when the initial imbalance $|b - w|$ is fixed and the total
number of balls $b + w$ diverges. We used the symmetry
$E(x, y) = E(y, x)$ so that (24) applies for both $b > w$ and
$w > b$. This equation implies that a tie is nearly certain
whenever the initial discrepancy in the number of balls
is much smaller than the square-root of the total number
of balls. Otherwise, the exit probability is substantially
reduced.



V. TYPICAL BEHAVIOR 

The asymptotic behavior (24) suggests that the exit
probability is a function of a single variable when the total
number of balls is very large. Specifically, the scaling
function $\Phi(z)$, given by

$$
E(b,w) \to \Phi(z) \quad \text{with} \quad z = \frac{|b-w|}{\sqrt{b+w}},
\tag{25}
$$

quantifies the exit probability in the limit $b + w \to \infty$.
Numerical evaluation of the exit probability (13) confirms
this scaling behavior (figure 2).

FIG. 3: The tail of the scaling function $\Phi(z)$. Shown is the
scaling function $\Phi(z)$ versus $z^2$.

The scaling function is monotonically decreasing because
a larger initial imbalance implies a smaller exit
probability. The small argument behavior of the scaling
function, $\Phi(z) \simeq 1 - \sqrt{2/\pi} \, z$, follows immediately from
(24). The large argument behavior follows from the exponential
behavior $E(b, 1) \sim 2^{-b}$, see (15). Indeed, when
$w = 1$ and $b$ is very large, the scaling variable is $z \simeq \sqrt{b}$,
and hence the tail of the scaling function must be Gaussian,
$\log \Phi(z) \sim -z^2$. Numerical evaluation of the exact
solution supports this heuristic prediction (see figure 3).
The scaling form (25) implies that the likelihood of a
tie is appreciable only when the initial population difference,
$\Delta = |b - w|$, is of the same order as the square-root
of the total population, $N = b + w$, that is,

$$ \Delta \sim \sqrt{N}. \tag{26} $$

A tie is nearly certain when the discrepancy is small,
$\Delta \ll \sqrt{N}$, but extremely rare when $\Delta \gg \sqrt{N}$.
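
The data collapse can be illustrated with a short numerical sketch. It is not the authors' code: the function name `exit_probability_series`, the truncation after a fixed number of terms, and the three sample configurations are choices made here. The series (11) is summed by starting from $C_0$ and multiplying by the ratio (12); since the terms decay as $n^{-2}$, the truncation error shrinks only inversely with the number of terms.

```python
from math import exp, lgamma, sqrt

def exit_probability_series(b, w, terms=200_000):
    """Truncated sum of Eq. (11), generated from C_0 and the term ratio (12)."""
    d = b - w
    # C_0 = Gamma(b) Gamma(b + w) / (Gamma(2b) Gamma(w)), computed via logarithms.
    c = exp(lgamma(b) + lgamma(b + w) - lgamma(2 * b) - lgamma(w))
    total = 0.0
    for k in range(terms):
        total += c
        c *= ((k + b) * (k + d / 2) * (k + (d + 1) / 2)
              / ((k + 1) * (k + b + 0.5) * (k + d + 1)))
    return total

# Three configurations with the same scaling variable z = |b - w| / sqrt(b + w) = 1.
collapse = {}
for b, w in [(55, 45), (210, 190), (820, 780)]:
    z = (b - w) / sqrt(b + w)
    collapse[b + w] = exit_probability_series(b, w)
    print(b + w, z, collapse[b + w])
```

Although the total population varies by a factor of sixteen, the three exit probabilities nearly coincide, as the scaling form (25) requires at fixed $z$.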



VI. NEAR TIES 

There are a number of generalizations of the first passage
process discussed above. Two natural questions are:
(i) what is the probability that the ratio between the majority
population and the minority population is always
above a fixed threshold, and (ii) what is the probability
that the difference between the two populations is always
above a fixed threshold. In this section, we address the
latter problem.



We define $G_n(b,w;d)$ to be the first passage probability
that, starting with configuration $(b,w)$, the difference
$B - W > d$ if and only if $W < n$. In other words,
$G_n(b,w;d)$ is the probability that black balls outnumber
white balls by more than $d$ throughout the evolution and,
moreover, that this condition is violated for the first time when
$(B, W) = (n + d, n)$. We obtain the first passage probability

$$
G_n(b,w;d) = \frac{b-w-d}{b+w} \binom{n+d-1}{b-1} \binom{n-1}{w-1} \binom{2n+d-1}{b+w}^{-1},
\tag{27}
$$

by multiplying the probability $P$ of Eq. (4) for transitioning from
$(b, w)$ to $(n + d, n)$ with the fraction $\frac{b-w-d}{2n+d-b-w}$ of these
paths that do not cross the line $B = W + d$ [1]. Of
course, this expression matches (5) when $d = 0$. Again,
the asymptotic behavior in the limit $n \to \infty$ is $G_n \sim n^{-2}$.
The exit probability $E(b, w; d)$ is the probability that
the line $B = W + d$ is reached at least once during the
evolution. By repeating the steps leading to (13), we find
the exit probability in terms of a higher-order hypergeometric
function,

$$
E(b,w;d) = \frac{\Gamma(b-d)\Gamma(b+w)}{\Gamma(2b-d)\Gamma(w)} \;
{}_4F_3(c_1, c_2, c_3, c_4; \, e_1, e_2, e_3; \, 1).
\tag{28}
$$

The corresponding arguments are

$$
c_1 = b, \quad c_2 = b-d, \quad c_3 = \frac{b-w-d}{2}, \quad c_4 = \frac{b-w-d+1}{2},
$$

$$
e_1 = b - \frac{d}{2}, \quad e_2 = b + \frac{1-d}{2}, \quad e_3 = b-w-d+1.
$$

For near-ties, $d = 1$, there are many similarities with
$E(b,w)$ of Eq. (13). For example, the exit probability
decays exponentially when the initial imbalance is maximal,

$$ E(b,1;1) = \frac{b}{b-1} \, 2^{1-b}. \tag{29} $$

Additionally, the exit probability is close to one when the
initial imbalance is minimal,

$$
E(b,b-2;1) = 1 - \frac{1}{2(b-1)} - \frac{\Gamma\left(b-\frac{1}{2}\right)}{\Gamma(b)\Gamma\left(\frac{1}{2}\right)}.
\tag{30}
$$

Thus, the behavior $E(b, b-2; 1) \simeq 1 - 1/\sqrt{\pi b}$ is recovered
when $b \to \infty$.


VII. DISCUSSION 

In summary, we obtained first passage characteristics
of the Polya urn process as a function of the initial condition.
The first passage probability that a tie is reached
for the first time when there are $2n$ balls decays algebraically,
$G_n \sim n^{-2}$, for large $n$. The probability that a
tie ever occurs is less than one; hence, ties are not certain.
This exit probability decreases as the initial discrepancy
in the number of balls increases. Moreover, there is a
universal scaling behavior when the total initial population
is very large. This scaling behavior implies that
the exit probability is appreciable only when the initial
population imbalance is of the order of the square-root
of the total population.

The key property of the Polya urn model is that the
fraction of white balls approaches a limiting value, but
this value fluctuates from realization to realization. In
many other urn models, however, the opposite is true,
and moreover, the two fractions approach the same limiting
value. This is the case for the Friedman urn process
[28, 29] which, in its simplest form, is equivalent to the
stochastic process

$$
(B, W) \to
\begin{cases}
(B+1, W) & \text{with probability } \frac{W}{B+W}, \\
(B, W+1) & \text{with probability } \frac{B}{B+W}.
\end{cases}
$$

It will be interesting to investigate first passage properties
of this urn process. We conjecture that first passage
statistics are much closer to those of the ordinary random
walk.

In the context of population dynamics and evolutionary
biology, continuous time processes are more appropriate.
The continuous time analog of the Polya urn model
(1) is the two-species branching process, $B \to B + B$ and
$W \to W + W$, where the two birth rates are equal. This
continuous time process is closely related to the discrete
time urn process. For instance, the total exit probabilities
for the two processes are identical.